6 research outputs found

    Look and Modify: Modification Networks for Image Captioning

    Full text link
    Attention-based neural encoder-decoder frameworks have been widely used for image captioning. Many of these frameworks generate the caption entirely from scratch, relying solely on global image features or object-detection region features. In this paper, we introduce a novel framework that learns to modify existing captions from a given framework by modeling the residual information: at each timestep the model learns what to keep, remove, or add to the existing caption, allowing it to focus fully on "what to modify" rather than on "what to predict". We evaluate our method on the COCO dataset, trained on top of several image captioning frameworks, and show that our model successfully modifies captions, yielding better captions with higher evaluation scores. Comment: Published in BMVC 2019
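    The keep/remove/add idea lends itself to a compact illustration. Below is a minimal sketch, not the paper's actual architecture: a single decoding step that conditions on image features and the previous word, with a learned gate that decides whether to keep the existing caption's token at this position or emit a new one. All module names, dimensions, and the gating form are illustrative assumptions.

```python
# Hedged sketch of a "modify an existing caption" decoding step (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualModifyCell(nn.Module):
    def __init__(self, vocab_size, embed_dim=512, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.keep_gate = nn.Linear(hidden_dim + embed_dim, 1)  # keep vs. modify decision
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, old_word, img_feat, state):
        # prev_word, old_word: (B,) token ids; img_feat: (B, feat_dim); state: (h, c)
        x = torch.cat([self.embed(prev_word), img_feat], dim=-1)
        h, c = self.lstm(x, state)
        p_keep = torch.sigmoid(self.keep_gate(torch.cat([h, self.embed(old_word)], dim=-1)))
        vocab_dist = F.softmax(self.out(h), dim=-1)
        old_dist = F.one_hot(old_word, vocab_dist.size(-1)).float()  # the existing caption's token
        # mix: keep the existing token with probability p_keep, otherwise predict a replacement
        return p_keep * old_dist + (1 - p_keep) * vocab_dist, (h, c)
```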

    Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks

    Full text link
    Natural Language Explanations (NLE) aim to supplement a model's prediction with human-friendly natural text. Existing NLE approaches involve training separate models for each downstream task. In this work, we propose Uni-NLX, a unified framework that consolidates all NLE tasks into a single, compact multi-task model using a unified training objective of text generation. Additionally, we introduce two new NLE datasets: 1) ImageNetX, a dataset of 144K samples for explaining ImageNet categories, and 2) VQA-ParaX, a dataset of 123K samples for explaining the task of Visual Question Answering (VQA). Both datasets are derived by leveraging large language models (LLMs). By training on the 1M combined NLE samples, our single unified framework can simultaneously perform seven NLE tasks, including VQA, visual recognition, and visual reasoning tasks, with 7X fewer parameters, achieving performance comparable to the independent task-specific models of previous approaches and even outperforming them on certain tasks. Code is at https://github.com/fawazsammani/uni-nlx Comment: Accepted to ICCVW 2023
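    A rough sketch of the unifying idea: every NLE task is cast as a task-prefixed input text paired with an explanation text, the datasets are concatenated, and a single generation loss covers all tasks. The task names and prompt format below are assumptions for illustration, not the exact setup of the paper or repository.

```python
# Hedged sketch of a unified text-generation objective over multiple NLE tasks.
from torch.utils.data import Dataset, ConcatDataset, DataLoader

class NLETask(Dataset):
    def __init__(self, task_name, pairs):
        self.task_name = task_name
        self.pairs = pairs  # list of (input_text, explanation_text)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        inp, expl = self.pairs[i]
        # one shared prompt format: the task prefix lets a single model handle all tasks
        return {"text_in": f"[{self.task_name}] {inp}", "text_out": expl}

# All tasks flow through one loader; one cross-entropy generation loss on text_out.
vqa_x = NLETask("vqa-x", [("question: what is the man doing?", "he is surfing because ...")])
act_x = NLETask("act-x", [("what activity is shown?", "mowing the lawn because ...")])
loader = DataLoader(ConcatDataset([vqa_x, act_x]), batch_size=2, shuffle=True)
```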

    Show, Edit and Tell: A Framework for Editing Image Captions

    Full text link
    Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing existing captions can be easier than generating new ones from scratch. Intuitively, when editing captions, a model is not required to learn information that is already present in the caption (i.e., sentence structure), enabling it to focus on fixing details (e.g., replacing repetitive words). This paper proposes a novel approach to image captioning based on iterative adaptive refinement of an existing caption. Specifically, our caption-editing model consists of two sub-modules: (1) EditNet, a language module with an adaptive copy mechanism (Copy-LSTM) and a Selective Copy Memory Attention mechanism (SCMA), and (2) DCNet, an LSTM-based denoising auto-encoder. These components enable our model to directly copy from and modify existing captions. Experiments demonstrate that our new approach achieves state-of-the-art performance on the MS COCO dataset, both with and without sequence-level training. Comment: Accepted to CVPR 2020
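    The copy-and-modify idea can be illustrated with a small sketch. This is not the paper's Copy-LSTM or SCMA module, but a generic pointer/copy step under assumed shapes: the decoder mixes a vocabulary distribution with an attention-derived copy distribution over the tokens of the caption being edited.

```python
# Hedged sketch of a copy mechanism over an existing caption (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CopyDecoderStep(nn.Module):
    def __init__(self, vocab_size, hidden_dim=512):
        super().__init__()
        self.gen = nn.Linear(hidden_dim, vocab_size)
        self.copy_gate = nn.Linear(hidden_dim, 1)

    def forward(self, h, old_tokens, old_states):
        # h: (B, H) decoder state; old_tokens: (B, T) ids of the existing caption;
        # old_states: (B, T, H) encoded states of that caption.
        attn = F.softmax(torch.einsum("bh,bth->bt", h, old_states), dim=-1)  # pointer over old caption
        copy_dist = torch.zeros(h.size(0), self.gen.out_features, device=h.device)
        copy_dist.scatter_add_(1, old_tokens, attn)            # project attention onto vocabulary ids
        gen_dist = F.softmax(self.gen(h), dim=-1)
        p_copy = torch.sigmoid(self.copy_gate(h))              # how much to copy vs. generate
        return p_copy * copy_dist + (1 - p_copy) * gen_dist
```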

    Visualizing and Understanding Contrastive Learning

    Full text link
    Contrastive learning has revolutionized the field of computer vision, learning rich representations from unlabeled data that generalize well to diverse vision tasks. Consequently, it has become increasingly important to explain these approaches and understand their inner working mechanisms. Given that contrastive models are trained with interdependent and interacting inputs and aim to learn invariance through data augmentation, the existing methods for explaining single-image systems (e.g., image classification models) are inadequate, as they fail to account for these factors. Additionally, there is a lack of evaluation metrics designed to assess pairs of explanations, and no analytical studies have investigated the effectiveness of the different techniques used to explain contrastive learning. In this work, we design visual explanation methods that contribute towards understanding similarity learning tasks from pairs of images. We further adapt existing metrics, used to evaluate visual explanations of image classification systems, to suit pairs of explanations, and evaluate our proposed methods with these metrics. Finally, we present a thorough analysis of visual explainability methods for contrastive learning, establish their correlation with downstream tasks, and demonstrate the potential of our approaches to investigate their merits and drawbacks.
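    As a point of reference for what a pairwise explanation involves, here is a minimal sketch of a plain gradient saliency computed on the cosine similarity of an image pair; the encoder is an illustrative placeholder, and the paper's proposed methods and adapted metrics go well beyond this simple baseline.

```python
# Hedged sketch: explain a similarity score for a pair of images via input gradients.
import torch
import torch.nn.functional as F
import torchvision.models as models

encoder = models.resnet18(weights=None)   # stand-in for a contrastively trained encoder
encoder.fc = torch.nn.Identity()
encoder.eval()

img_a = torch.rand(1, 3, 224, 224, requires_grad=True)
img_b = torch.rand(1, 3, 224, 224, requires_grad=True)

sim = F.cosine_similarity(encoder(img_a), encoder(img_b))  # similarity for the pair
sim.sum().backward()

# Per-pixel saliency for each image in the pair: where does the similarity come from?
saliency_a = img_a.grad.abs().max(dim=1).values  # (1, 224, 224)
saliency_b = img_b.grad.abs().max(dim=1).values
```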

    Deep convolutional networks for magnification of DICOM Brain Images

    No full text
    Convolutional neural networks have recently achieved great success in Single Image Super-Resolution (SISR). SISR is the task of reconstructing a high-quality image from a low-resolution one. In this paper, we propose a deep Convolutional Neural Network (CNN) for the enhancement of Digital Imaging and Communications in Medicine (DICOM) brain images. The network learns an end-to-end mapping between the low- and high-resolution images. We first extract features from the image, where each new layer is connected to all previous layers. We then adopt residual learning and a mixture of convolutions to reconstruct the image. Our network is designed to work with grayscale images, since brain images are originally in grayscale. We further compare our method with previous works, trained on the same brain images, and show that our method outperforms them.
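    The two ingredients mentioned above, densely connected feature extraction and residual reconstruction, can be sketched as follows for single-channel (grayscale) inputs; layer counts, widths, and the assumption of a bicubic-upsampled input are illustrative choices, not the paper's exact configuration.

```python
# Hedged sketch of dense feature extraction + global residual learning for SISR.
import torch
import torch.nn as nn

class DenseResidualSR(nn.Module):
    def __init__(self, channels=1, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            # each new layer sees the concatenation of all previous feature maps
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1), nn.ReLU(inplace=True)))
            in_ch += growth
        self.reconstruct = nn.Conv2d(in_ch, channels, kernel_size=3, padding=1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        # residual learning: predict the detail to add to the (already upsampled) input
        return x + self.reconstruct(torch.cat(feats, dim=1))

lr_upscaled = torch.rand(1, 1, 64, 64)    # bicubic-upsampled low-resolution slice
hr_estimate = DenseResidualSR()(lr_upscaled)
```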

    EEG Signal Analysis of Stroke Patients with Applications of Deep Learning

    No full text